framework for resource selection
Luo Si, Jamie Callan 

November 2004 Proceedings of the thirteenth ACM international conference on 
Information and knowledge management CIKM '04 

Publisher: ACM Press 

Full text available: pdf (174.30 KB) Additional Information: full citation, abstract, references, index terms

This paper presents a unified utility framework for resource selection of distributed text 
information retrieval. This new framework shows an efficient and effective way to infer 
the probabilities of relevance of all the documents across the text databases. With the 
estimated relevance information, resource selection can be made by explicitly optimizing 
the goals of different applications. Specifically, when used for database recommendation, 
the selection is optimized for the goal of high-rec ... 



Keywords: distributed information retrieval, resource selection 



Shape-based retrieval and analysis of 3D models 
Thomas Funkhouser, Michael Kazhdan 

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press 

Full text available: pdf (12.56 MB) Additional Information: full citation, abstract

Large repositories of 3D data are rapidly becoming available in several fields, including 
mechanical CAD, molecular biology, and computer graphics. As the number of 3D models 
grows, there is an increasing need for computer algorithms to help people find the 
interesting ones and discover relationships between them. Unfortunately, traditional text-based
search techniques are not always effective for 3D models, especially when queries
are geometric in nature (e.g., find me objects that fit into thi ... 

Evaluating collaborative filtering recommender systems 

Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, John T. Riedl 

January 2004 ACM Transactions on Information Systems (TOIS), volume 22 issue 1

Publisher: ACM Press 

Full text available: Additional Information: full citation, abstract, references, citings, index terms

Recommender systems have been evaluated in many, often incomparable, ways. In this
article, we review the key decisions in evaluating collaborative filtering recommender 
systems: the user tasks being evaluated, the types of analysis and datasets being used, 
the ways in which prediction quality is measured, the evaluation of prediction attributes 
other than quality, and the user-based evaluation of the system as a whole. In addition to 
reviewing the evaluation strategies used by prior researchers ... 

Keywords: Collaborative filtering, evaluation, metrics, recommender systems 
The SIFT information dissemination system
Tak W. Yan, Hector Garcia-Molina 

December 1999 ACM Transactions on Database Systems (TODS), volume 24 issue 4 
Publisher: ACM Press 

Full text available: pdf (220.77 KB) Additional Information: full citation, abstract, references, citings, index terms

Information dissemination is a powerful mechanism for finding information in wide-area 
environments. An information dissemination server accepts long-term user queries, 
collects new documents from information sources, matches the documents against the 
queries, and continuously updates the users with relevant information. This paper is a 
retrospective of the Stanford Information Filtering Service (SIFT), a system that as of 
April 1996 was processing over 40,000 worldwide subscriptions and ov ... 

Keywords: Boolean queries, dissemination, filtering, indexing, vector space queries 



Special issue: Game-playing programs: theory and practice 
M. A. Bramer 

April 1982 ACM SIGART Bulletin, Issue 80 
Publisher: ACM Press 

Full text available: pdf (9.23 MB) Additional Information: full citation, abstract

This collection of articles has been brought together to provide SIGART members with an 
overview of Artificial Intelligence approaches to constructing game-playing programs. 
Papers on both theory and practice are included. 



6 Navigating in information spaces: Information foraging models of browsers for very large document spaces
Peter Pirolli, Stuart K. Card 

May 1998 Proceedings of the working conference on Advanced visual interfaces 

Publisher: ACM Press 

Full text available: pdf (4.29 MB) Additional Information: full citation, abstract, references, citings

Information Foraging (IF) Theory addresses user strategies and technology for seeking, 
gathering, and using on-line information. We present IF-based models and evaluations of 
two interfaces: the Scatter/Gather browser for large document collections, and the 
Butterfly interface for surfing the citation link structure of scientific literatures. A 
computational cognitive model, ACT-IF, models observed users by assuming that they 
have heuristics that optimize their information foraging behavior in a ... 

Keywords: cognitive models, information foraging theory, information retrieval 
query concepts
Edward Chang, Beitao Li 

October 2003 ACM Transactions on Information Systems (TOIS), Volume 21 issue 4 
Publisher: ACM Press 

Full text available: pdf (1.34 MB) Additional Information: full citation, abstract, references, citings, index terms

Specifying exact query concepts has become increasingly challenging to end-users. This is 
because many query concepts (e.g., those for looking up a multimedia object) can be 
hard to articulate, and articulation can be subjective. In this study, we propose a query-concept
learner that learns query criteria through an intelligent sampling process. Our
concept learner aims to fulfill two primary design objectives: (1) it has to be expressive in 
order to model most practical query concepts and (2) i ... 

Keywords: Active learning, data mining, query concept, relevance feedback 



Special issue: AI in engineering

D. Sriram, R. Joobbani 

April 1985 ACM SIGART Bulletin, issue 92 

Publisher: ACM Press 

Full text available: pdf (8.79 MB) Additional Information: full citation, abstract

The papers in this special issue were compiled from responses to the announcement in 
the July 1984 issue of the SIGART newsletter and notices posted over the ARPAnet. The 
interest being shown in this area is reflected in the sixty papers received from over six 
countries. About half the papers were received over the computer network. 

Boosting for document routing 

Raj D. Iyer, David D. Lewis, Robert E. Schapire, Yoram Singer, Amit Singhal 

November 2000 Proceedings of the ninth international conference on Information and knowledge management
Publisher: ACM Press 

Full text available: pdf (263.57 KB) Additional Information: full citation, references, citings, index terms



Keywords: boosting, ranking, routing, supervised learning, text representation 



10 Text categorization: Using asymmetric distributions to improve text classifier probability estimates
Paul N. Bennett

July 2003 Proceedings of the 26th annual international ACM SIGIR conference on 
Research and development in information retrieval

Publisher: ACM Press 

Full text available: pdf (281.97 KB) Additional Information: full citation, abstract, references, citings, index terms

Text classifiers that give probability estimates are more readily applicable in a variety of 
scenarios. For example, rather than choosing one set decision threshold, they can be used 
in a Bayesian risk model to issue a run-time decision which minimizes a user-specified 
cost function dynamically chosen at prediction time. However, the quality of the 
probability estimates is crucial. We review a variety of standard approaches to converting 
scores (and poor probability estimates) from text classifi ... 

Keywords: active learning, classifier combination, cost-sensitive learning, text 
11 Comparing the performance of collection selection algorithms
Allison L. Powell, James C. French

October 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 4 
Publisher: ACM Press 

Full text available: pdf (668.40 KB) Additional Information: full citation, abstract, references, citings, index terms

The proliferation of online information resources increases the importance of effective and 
efficient information retrieval in a multicollection environment. Multicollection searching is 
cast in three parts: collection selection (also referred to as database selection), query 
processing and results merging. In this work, we focus our attention on the evaluation of 
the first step, collection selection. In this article, we present a detailed discussion of the 
methodology that we used to evaluate an ... 

Keywords: Collection selection, database selection, distributed information retrieval, 
distributed text retrieval, metasearch engine, resource discovery, resource ranking, 
resource selection, server ranking, server selection, text retrieval 



12 Three-dimensional medical imaging: algorithms and computer systems 
M. R. Stytz, G. Frieder, O. Frieder 

December 1991 ACM Computing Surveys (CSUR), Volume 23 issue 4 
Publisher: ACM Press 

Full text available: pdf (7.38 MB) Additional Information: full citation, references, citings, index terms, review



Keywords: Computer graphics, medical imaging, surface rendering, three-dimensional 
imaging, volume rendering 



13 Modeling score distributions for combining the outputs of search engines 
R. Manmatha, T. Rath, F. Feng 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf (236.39 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper the score distributions of a number of text search engines are modeled. It is 
shown empirically that the score distributions on a per query basis may be fitted using an 
exponential distribution for the set of non-relevant documents and a normal distribution 
for the set of relevant documents. Experiments show that this model fits TREC-3 and 
TREC-4 data for not only probabilistic search engines like INQUERY but also vector space 
search engines like SMART for English. We have als ... 

14 Research session: DB and IR #2: Shuffling a stacked deck: the case for partially 
randomized ranking of search engine results 

Sandeep Pandey, Sourashis Roy, Christopher Olston, Junghoo Cho, Soumen Chakrabarti 
August 2005 Proceedings of the 31st international conference on Very large data bases VLDB '05
Publisher: VLDB Endowment

Full text available: pdf (193.25 KB) Additional Information: full citation, abstract, references, index terms

In-degree, PageRank, number of visits and other measures of Web page popularity 
significantly influence the ranking of search results by modern search engines. The 
assumption is that popularity is closely correlated with quality, a more elusive concept 
that is difficult to measure directly. Unfortunately, the correlation between popularity and 
quality is very weak for newly-created pages that have yet to receive many visits and/or 
in-links. Worse, since discovery of new content is ... 

15 Usage analysis: Improving recommendation lists through topic diversification
Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen 
May 2005 Proceedings of the 14th international conference on World Wide Web 
Publisher: ACM Press 

Full text available: pdf (298.18 KB) Additional Information: full citation, abstract, references, index terms

In this work we present topic diversification, a novel method designed to balance and 
diversify personalized recommendation lists in order to reflect the user's complete 
spectrum of interests. Though being detrimental to average accuracy, we show that our 
method improves user satisfaction with recommendation lists, in particular for lists 
generated using the common item-based collaborative filtering algorithm. Our work builds 
upon prior research on recommender systems, looking at properties of re ... 

Keywords: accuracy, collaborative filtering, diversification, metrics, recommender 
systems 




16 Distributed Information Retrieval: Exploiting a controlled vocabulary to improve collection selection and retrieval effectiveness

James C. French, Allison L. Powell, Fredric Gey, Natalia Perelman

October 2001 Proceedings of the tenth international conference on Information and knowledge management
Publisher: ACM Press 

Full text available: pdf (1.47 MB) Additional Information: full citation, abstract, references, index terms

Vocabulary incompatibilities arise when the terms used to index a document collection are 
largely unknown, or at least not well-known to the users who eventually search the 
collection. No matter how comprehensive or well-structured the indexing vocabulary, it is 
of little use if it is not used effectively in query formulation. This paper demonstrates that 
techniques for mapping user queries into the controlled indexing vocabulary have the 
potential to radically improve document retrieval perform ... 

17 Enhanced hypertext categorization using hyperlinks 
Soumen Chakrabarti, Byron Dom, Piotr Indyk 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, volume 27 issue 2 
Publisher: ACM Press 

Full text available: pdf (1.91 MB) Additional Information: full citation, abstract, references, citings, index terms

A major challenge in indexing unstructured hypertext databases is to automatically 
extract meta-data that enables structured search using topic taxonomies, circumvents 
keyword ambiguity, and improves the quality of search and profile-based routing and 
filtering. Therefore, an accurate classifier is an essential component of a hypertext 
database. Hyperlinks pose new problems not addressed in the extensive text classification 
literature. Links clearly contain high-quality semantic clues that ... 
Sunil Hadap, Dave Eberle, Pascal Volino, Ming C. Lin, Stephane Redon, Christer Ericson
August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press 

Full text available: pdf (11.22 MB) Additional Information: full citation, abstract

This course will primarily cover widely accepted and proved methodologies in collision 
detection. In addition more advanced or recent topics such as continuous collision 
detection, ADFs, and using graphics hardware will be introduced. When appropriate the 
methods discussed will be tied to familiar applications such as rigid body and cloth 
simulation, and will be compared. The course is a good overview for those developing 
applications in physically based modeling, VR, haptics, and robotics. 

19 Learning I: Mean version space: a new active learning method for content-based image retrieval

Jingrui He, Hanghang Tong, Mingjing Li, Hong-Jiang Zhang, Changshui Zhang

October 2004 Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval
Publisher: ACM Press 

Full text available: pdf (360.62 KB) Additional Information: full citation, abstract, references, index terms

In content-based image retrieval, relevance feedback has been introduced to narrow the 
gap between low-level image feature and high-level semantic concept. Furthermore, to 
speed up the convergence to the query concept, several active learning methods have 
been proposed instead of random sampling to select images for labeling by the user. In 
this paper, we propose a novel active learning method named mean version space, aiming 
to select the optimal image in each round of relevance feedback. Fi ... 

Keywords: active learning, content-based image retrieval, relevance feedback, version 
space 



20 Web search 3: Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval
Elad Yom-Tov, Shai Fine, David Carmel, Adam Darlow 

August 2005 Proceedings of the 28th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '05 

Publisher: ACM Press 

Full text available: pdf (378.98 KB) Additional Information: full citation, abstract, references, index terms

In this article we present novel learning methods for estimating the quality of results 
returned by a search engine in response to a query. Estimation is based on the agreement 
between the top results of the full query and the top results of its sub-queries. We 
demonstrate the usefulness of quality estimation for several applications, among them 
improvement of retrieval, detecting queries for which no relevant content exists in the 
document collection, and distributed information retrieval. Expe ... 

Keywords: query difficulty estimation 
1 Research session: DB and IR #1: An efficient and versatile query engine for TopX search

Martin Theobald, Ralf Schenkel, Gerhard Weikum 

August 2005 Proceedings of the 31st international conference on Very large data bases VLDB '05

Publisher: VLDB Endowment 

Full text available: pdf (442.21 KB) Additional Information: full citation, abstract, references, index terms

This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML 
documents over semistructured but nonschematic data collections. The algorithm follows 
the paradigm of threshold algorithms for top-k query processing with a focus on 
inexpensive sequential accesses to index lists and only a few judiciously scheduled 
random accesses. The difficulties in applying the existing top-k algorithms to XML data lie 
in 1) the need to consider scores for XML elements while aggreg ... 

2 Searching the Web

August 2001 ACM Transactions on Internet Technology (TOIT), volume 1 issue 1
Publisher: ACM Press

Full text available: Additional Information: full citation, abstract, references, citings, index terms, review



We offer an overview of current Web search engine design. After introducing a generic 
search engine architecture, we examine each engine component in turn. We cover 
crawling, local Web page storage, indexing, and the use of link analysis for boosting 
search performance. The most common design and implementation techniques for each of 
these components are presented. For this presentation we draw from the literature and 
from our own experimental search engine testbed. Emphasis is on introduci ... 
Linkages among documents have a significant impact on the importance of documents, as
it can be argued that important documents are pointed to by many documents or by other 
important documents. Metasearch engines can be used to facilitate ordinary users for 
retrieving information from multiple local sources (text databases). There is a search 
engine associated with each database. In a large-scale metasearch engine, the contents 
of each local database is represented by a representative. Each u ... 
We study the process in which search engines with segmented indices serve queries. In
particular, we investigate the number of result pages that search engines should prepare 
during the query processing phase. Search engine users have been observed to browse 
through very few pages of results for queries that they submit. This behavior of users 
suggests that prefetching many results upon processing an initial query is not efficient, 
since most of the prefetched results will not be requested by the ... 
Recent increase in the number of search engines on the Web and the availability of meta
search engines that can query multiple search engines makes it important to find effective 
methods for combining results coming from different sources. In this paper we introduce 
novel methods for reranking in a meta search environment based on expert agreement 
and contents of the snippets. We also introduce an objective way of evaluating different 
methods for ranking search results that is based upon implici ... 
Understanding distributed applications is a tedious and difficult task. Visualizations based
on process-time diagrams are often used to obtain a better understanding of the 
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex 
and do not provide the user with the desired overview of the application. In our 
experience, such tools display repeated occurrences of non-trivial commun ... 
In the fall of 1978 we decided to produce a special issue of the SIGART Newsletter
devoted to a survey of current knowledge representation research. We felt that there 
were two useful functions such an issue could serve. First, we hoped to elicit a clear
picture of how people working in this subdiscipline understand knowledge representation 
research, to illuminate the issues on which current research is focused, and to catalogue 
what approaches and techniques are currently being developed. Secon ... 
When a query is submitted to a metasearch engine, decisions are made with respect to
the underlying search engines to be used, what modifications will be made to the query, 
and how to score the results. These decisions are typically made by considering only the 
user's keyword query, neglecting the larger information need. Users with specific needs, 
such as "research papers" or "homepages," are not able to express these needs in a way 
that affects the decisions made b ... 
Querying XML data is a well-explored topic with powerful database-style query languages
such as XPath and XQuery set to become W3C standards. An equally compelling paradigm 
for querying XML documents is full-text search on textual content. In this paper, we study 
fundamental challenges that arise when we try to integrate these two querying 
paradigms. While keyword search is based on approximate matching, XPath has exact 
match semantics. We address this mismatch by considering queries on structure ... 
Prior research under a variety of conditions has shown the CORI algorithm to be one of
the most effective resource selection algorithms, but the range of database sizes studied 
was not large. This paper shows that the CORI algorithm does not do well in 
environments with a mix of "small" and "very large" databases. A new resource selection 
algorithm is proposed that uses information about database sizes as well as database 
contents. We also show how to acquire database size estimates in uncoopera ... 
The dramatic growth of the Internet has created a new problem for users: location of the
relevant sources of documents. This article presents a framework for (and experimentally 
analyzes a solution to) this problem, which we call the text-source discovery problem. Our 
approach consists of two phases. First, each text source exports its contents to a 
centralized service. Second, users present queries to the service, which returns an 
ordered list of promising text sources. T ... 
A query to a web search engine usually consists of a list of keywords, to which the search
engine responds with the best or "top" k pages for the query. This top-k query model is 
prevalent over multimedia collections in general, but also over plain relational data for 
certain applications. For example, consider a relation with information on available 
restaurants, including their location, price range for one diner, and overall food rating. A 
user who queries such a relation might ... 
We introduce a method for learning to find documents on the Web that contain answers to
a given natural language question. In our approach, questions are transformed into new 
queries aimed at maximizing the probability of retrieving answers from existing 
information retrieval systems. The method involves automatically learning phrase features 
for classifying questions into different types, automatically generating candidate query 
transformations from a training set of question/answer pairs, and ... 

Keywords: Web search, information retrieval, meta-search, query expansion, question 
With the explosive growth of the World Wide Web, the public is gaining access to massive
amounts of information. However, locating needed and relevant information remains a 
difficult task, whether the information is textual or visual. Text search engines have 
existed for some years now and have achieved a certain degree of success. However, 
despite the large number of images available on the Web, image search engines are still 
rare. In this article, we show that in order to allow people to profi ... 
The unabated growth and increasing significance of the World Wide Web has resulted in a
flurry of research activity to improve its capacity for serving information more effectively. 
But at the heart of these efforts lie implicit assumptions about "quality" and "usefulness" 
of Web resources and services. This observation points towards measurements and 
models that quantify various attributes of web sites. The science of measuring all aspects 
of information, especially its storage and retrieval or ... 